On the role of old information in generating readable text: a psychological and computational definition of 'old' and 'new' information in the NOSVO system
Abstract
There are at least two stages of text generation. One is generating the content of the text. The other is generating the language that represents and communicates the content (Thompson 1977). These two stages, though interrelated, have their own sets of interesting problems and principles. The first stage, generating the semantic content of the text, involves motivating, planning and creating the conceptual and semantic content of a piece of text. Once the semantic representation for a text has been constructed, the language of that text can be generated. The second stage, language generation, involves communicating the intent and content of the text without confusing or misleading the reader. This paper will address the second stage only. It is not enough to merely generate text. It is also necessary to generate cohesive text. Cohesion alone is not sufficient, however: a shopping list is cohesive, though not "flowing" text by any means. A set of sentences that are propositionally related is cohesive, though not necessarily beautiful prose. It is not enough to attend to ellipsis and pronominalization to generate readable prose. We believe that there are other factors which must be attended to in order to generate readable prose. The NOSVO system is an attempt to take into account old/new information contrasts (Chafe 1974, 1976), which we believe will help natural language generation systems produce more readable text.

[...] 'assumable' as being there" (Prince 1978:819). This is quite important and expands upon Chafe, since for him the important thing is that the antecedent must be in the hearer's consciousness, i.e. in the hearer's focus of attention, while for Prince and LaPolla it need only be appropriate to the situation, or in some other way cooperatively assumable to be in the hearer's consciousness.
Hajičová and Vrbová (1981) also take exception to the terms "given (or old)" and "new" information and suggest that "contextually bound" and "contextually non-bound" lexical items would be more appropriate. "Contextually bound" and "contextually non-bound" are even more appropriate than "already activated" and "newly activated" because they also seem to convey situational appropriateness. However, it seems that Hajičová restricts her terminology, as well as her theory of discourse (focus) structure, to linguistic antecedents. That is, her "shared stock of knowledge" appears to be close to, if not completely, linguistic in representation. Therefore, neither her theory nor her terminology has the power to deal with an antecedent that is merely inferable or appropriate to a situation. We will use the familiar terms "new/old information" but will define them a little more precisely later in the paper.
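The contrast between an antecedent-only notion of old information and one that also admits situationally assumable referents can be made concrete with a small sketch. This is not the NOSVO implementation, which the text here does not describe. The names `DiscourseState`, `classify`, and `linguistic_only`, and the restaurant example, are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DiscourseState:
    """Hypothetical record of what the hearer can be assumed to know."""
    # Referents that already have a linguistic antecedent in the text.
    mentioned: set = field(default_factory=set)
    # Referents with no antecedent that are inferable from, or appropriate
    # to, the situation (an assumption made for this illustration).
    assumable: set = field(default_factory=set)


def classify(referent: str, state: DiscourseState,
             linguistic_only: bool = False) -> str:
    """Label a referent 'old' or 'new'.

    With linguistic_only=True, only referents with a linguistic antecedent
    count as old. With linguistic_only=False, referents that are merely
    assumable in the situation count as old as well.
    """
    if referent in state.mentioned:
        return "old"
    if not linguistic_only and referent in state.assumable:
        return "old"
    return "new"


# Illustrative state: "restaurant" was mentioned; "waiter" and "menu" were
# not, but are taken to be assumable once a restaurant is under discussion.
state = DiscourseState(mentioned={"restaurant"}, assumable={"waiter", "menu"})
```

Under these assumptions, `classify("waiter", state)` treats the waiter as old information, while `classify("waiter", state, linguistic_only=True)` treats it as new, which is the gap the paragraph above attributes to a purely linguistic definition.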